Automatic Processing of Personal Names for Filing


Foster M. PALMER: Associate University Librarian, Harvard University 
Library, Cambridge, Massachusetts 


Describes a method for preparing personal names already in machine-readable
form for processing by any standard computer sort program, determining
filing order insofar as possible from normally available information
rather than from special formatting. Prefix recognition is emphasized;
multi-word forename entries are a problem area. Provision is made for an
edit list of problems requiring human decision. Possible extension of the
method to titles is discussed.


This paper describes a method of computerized filing of personal names 
for display in book catalogs or other lists intended for direct human 
consultation. The problem is to be distinguished from a related but dif- 
ferent one: computerized storage for retrieval by means of a search key, 
in which machine rather than human convenience can determine the order. 

To the extent that filing is a purely mechanistic sorting process, it is
ideally suited to computerization. However, it was early recognized that
there are many possible complications in machine filing of library entries,
even in the relatively straightforward area of personal names. Some of these
complications arise from such factors as upper-case codes, diacritic codes
and punctuation; others are the result of library rules or practices that call
for departures from strict alphabetical order. While the latter are especially
numerous in subject headings and titles, they affect names as well, for
example, the custom of filing Mc as if Mac.

While no general review of the literature on machine filing will be
attempted here, attention will be called to selected contributions. Nugent
(1) described an approach to computerizing the Library of Congress filing
rules and pointed out areas where the present rules do not lend themselves
to mechanization. Cartwright and Shoffner (2) discussed four major ways
of approaching a solution to the problem and concluded that a mixture
of different methods would eventually be required. In a later publication,
Cartwright (3) developed his ideas further and included a brief description
of the present writer’s then unpublished work. The principal monograph
on the subject is that by Hines and Harris (4). They present a suggested
filing code departing significantly from those in widespread use and
propose that material be encoded in a certain fashion so that it will be
ready for computer sorting. In particular, considerable dependence is
placed on distinctions between single, double, and multiple blanks
separating words or fields. In a recent paper, Harris and Hines restate
their rules briefly and report on their later research (5).

The present paper describes a different, virtually an opposite, approach.
Rather than relying on special formatting of the material at the time of
encoding, the system described herein attempts to derive the necessary
filing information from normally formatted material. Historically, it grew
out of a desire to construct improved indexes for use at the Harvard
University Library to the body of records distributed by the MARC Pilot
Project, in which there were field indicators and a limited number of
delimiters within fields, but a general absence of information added
expressly for the purpose of filing.

While some early work embraced both personal names and titles, it 
was soon apparent that names by themselves presented a considerable 
challenge, and further consideration of the even more difficult areas of 
titles, corporate entries, and subject entries was deferred. A few comments 
on the possible applicability of the general method to titles will be made 
later. 

The concrete form which the work eventually took was an Autocoder
macro instruction for a second generation computer, an IBM 1401. (A
macro instruction is a means of calling forth by means of a single instruction
a more extensive routine already worked out and placed in the system
“library.”) Since the 1401 was a fairly small computer, it was important
that the algorithm not require an excessive number of instructions, and
since the internal speed of the machine was only moderate, it was also
important that processing be direct and economical. The method used,
however, is by no means limited to a particular computer or a particular
language. A partial version of the algorithm has been written in ADPAG,
as an exercise in the evaluation of that language, and run on an IBM 360-65
using MARC II test data.

The system is based on examination of names (previously identified as
such by appropriate tags) and development of parallel sort keys consisting
only of letters, numerals, and blanks, readily processable by any standard
computer sort package designed for alphanumeric information. The only
requirements are that blank sort low and that the letters A-Z and the
numerals 0-9 sort in their natural order; whether numbers are considered
higher or lower than letters does not matter.

Processing starts at the beginning of the name and proceeds until one of
three conditions prevails: the number of characters examined is equal to
the length of the field as specified in the record; the number of characters
in the key has reached a specified cutoff point or the default value of 40;
or a delimiter indicating the end of the name, or the end of the name
proper, is encountered (a search being then made beyond the delimiter for
a date, which, if found, is added to the sort key).

The sort key is derived by transferring letters (or, in the case of a date,
numbers) from the source, with occasional modifications as described
below, and inserting one of four filing codes at the end of each word or
of the name. In early work, single special characters were used as filing
codes, but this was inappropriate as a general solution since the filing
order of these characters depended on the collating sequence peculiar to a
particular computer. Furthermore, it was inconvenient because it involved
changing all blanks to something else, since a blank within a name, with
its implication of something to follow, should not file as low as whatever
code marked the very end of the name. The idea of using a two-character
code, the first always being blank so that any filing code will file ahead
of any letter, is derived from Nugent (1) and has been followed in all
later work. Only three filing codes were actually used in compiling
indexes to the MARC I tapes, and in the first description privately
circulated by the author (6). However, at least four are now seen to be
necessary, actual need to distinguish the second and third not yet having
been encountered but being possible:

Code (blank
followed by:)   Placement

     3          The end of the name including date if any.
     5          Between the name proper and a date.
     6          The end of the surname.
     7          The end of any other “word” of the name.
                (A word is any element followed by a blank,
                hyphen, comma, or period, except that prefixes
                which are identified as such are not considered
                separate words.)
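
The placement of the four codes can be illustrated with a minimal Python sketch. It is not the Autocoder macro itself: the function names `words` and `sort_key` are hypothetical, the date delimiter is assumed to appear as "=" (as in the MARC Pilot examples), and prefix recognition, diacritics, and the 40-character cutoff are deliberately left out.

```python
import re

END_NAME, BEFORE_DATE, END_SURNAME, END_WORD = "3", "5", "6", "7"

def words(text):
    """Split on blank, hyphen, comma, or period; keep letters only."""
    parts = re.split(r"[ \-,.]+", text.lower())
    cleaned = ["".join(ch for ch in part if ch.isalpha()) for part in parts]
    return [w for w in cleaned if w]

def sort_key(name):
    """Build a sort key of lower-case words, each followed by blank + code."""
    proper, _, date_part = name.partition("=")       # assumed date delimiter
    surname, _, forenames = proper.partition(",")
    sur, fore = words(surname), words(forenames)
    elements = sur + fore
    if not elements:
        return ""
    codes = [END_WORD] * len(elements)
    if sur and fore:
        codes[len(sur) - 1] = END_SURNAME            # last word of the surname
    date = "".join(ch for ch in date_part if ch.isdigit())
    if date:
        codes[-1] = BEFORE_DATE                      # name proper ends here
        elements.append(date)
        codes.append(END_NAME)
    else:
        codes[-1] = END_NAME
    return "".join(f"{element} {code}" for element, code in zip(elements, codes))
```

Because every code is preceded by a blank, and blank sorts low, a shorter name files ahead of any longer name that begins with the same words.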

The following examples illustrate the use of the codes and the general
form of the resulting sort keys; each name is shown with the key derived
from it.


Approved For Release 2001/11/20 : CIA-RDP79M00097A00020001 0003-7 






188 Journal of Library Automation Vol. 4/4 December, 1971 


Arthur                               arthur 3
Arthur, Joseph                       arthur 6joseph 3
Arthur, Joseph,=1875-                arthur 6joseph 51875 3
Arthur, Joseph Charles               arthur 6joseph 7charles 3
Arthur-Behenna, K.                   arthur 7behenna 6k 3
Arthur-Petrós, Gabriele Maria        arthur 7petros 6gabriele 7maria 3
Wilson, William                      wilson 6william 3
Wilson, William,=1923-               wilson 6william 51923 3
Wilson, William Lyne                 wilson 6william 7lyne 3
Wilson-Browne, A. E.                 wilson 7browne 6a 7e 3
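
That such keys need nothing beyond a standard sort can be checked with a short Python sketch. It takes the ten keys listed above, shuffles them, and sorts them with the built-in `sorted`, which satisfies the two stated requirements (blank sorts low; letters and numerals sort in natural order). The variable names are illustrative only.

```python
import random

# The ten sort keys from the example, in the intended filing order.
expected = [
    "arthur 3",
    "arthur 6joseph 3",
    "arthur 6joseph 51875 3",
    "arthur 6joseph 7charles 3",
    "arthur 7behenna 6k 3",
    "arthur 7petros 6gabriele 7maria 3",
    "wilson 6william 3",
    "wilson 6william 51923 3",
    "wilson 6william 7lyne 3",
    "wilson 7browne 6a 7e 3",
]

shuffled = expected[:]
random.Random(0).shuffle(shuffled)   # fixed seed so the run is repeatable

# A plain character-by-character sort restores the filing order.
filed = sorted(shuffled)
```

The forename-less entry, the dated entry, and the compound surnames all fall into place solely because of the blank-plus-numeral codes.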


The use of the numbers 3, 5, 6, and 7 is arbitrary to a degree. An interval
was left between 3 and 5 so that the end of name code could be changed 
to 4 if the name were a subject rather than a main or added entry. No 
extra interval to accommodate added entry as distinguished from main 
entry was left because the author did not wish to encourage what he 
regards as an unwise practice. However, those who insist may easily 
substitute a new series of codes allowing for it. 

The distinction between end of name and end of surname serves to 
bring simple forename entries, that is those consisting of a single word, 
e.g. Sophocles, ahead of similar surnames, e.g. Sophocles, Evangelinus
Apostolides. No serious work has yet been undertaken on the problem of
processing complex forenames, but the distinctive tagging of forenames
in MARC II has made available a growing body of experimental data and
the codes 1 (and 2 for subject) are reserved for possible future use in
this connection, without any intent of prejudging the question whether 
complex forename entries should come before similar surnames. It is the 
view of the author that the filing of complex forename entries is one of 
the areas in which all librarians are on most uncertain grounds in assessing 
the preference and convenience of readers. 

In handling such entries as Alexander, Mrs., or Maurice, Sister, the 
algorithm depends on the presence of a delimiter before Mrs. or Sister 
to avoid filing after Alexander, Milton or Maurice, Robert. Such delimiters 
were in fact present in the MARC Pilot Project data. Despite the limitations 
mentioned in dealing with multiple-word forename entries and with sur- 
names lacking forenames, the algorithm is well suited to names in the 
normal modern pattern, namely a simple or compound surname followed 
by a comma and one or more given names or initials. Furthermore, very 
specifically, it deals with prefix names. Prefixes with apostrophes are taken 
care of by a general dropping out of apostrophes and other non-significant 
punctuation: 

[L’Isle, Guillaume de]               lisle 6guillaume 7de 3
O’Brian, Robert Enlow                obrian 6robert 7enlow 3

The same feature also handles such names as the following:

Prud’homme, Louis Arthur             prudhomme 6louis 7arthur 3
Ta’Bois, Roland                      tabois 6roland 3

Most prefixes, however, are dealt with by a specific search based on
examining the first letter of each new “word” of the name. If the element
begins with A, B, D, E, F, I, L, M, O, S, T, V, or Z, a branch is made to
a prefix searching routine tailor-made for the particular letter. Taking
names beginning with L as an example, if the second character is “e,” “a,”
or “o” a prefix may be present; otherwise the prefix search is discontinued.
If still searching and the third character is a blank or a hyphen, a prefix
is judged to be present. The letters “le,” “la,” or “lo” are moved to the
sort key output field. Three input and two output characters are counted,
effectively skipping over the blank or hyphen. Similarly, if the third
character is an s followed by a blank or a hyphen, “les,” “los,” or “las”
is moved with a count of four input and three output. Otherwise there
is no prefix.

La Place, Pierre Antoine de          laplace 6pierre 7antoine 7de 3
Las Cases, Philippe de               lascases 6philippe 7de 3
Le Fanu, Joseph Sheridan             lefanu 6joseph 7sheridan 3
Lo Presti, Salvatore                 lopresti 6salvatore 3
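
The L routine just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original macro: `l_prefix` and `close_up` are hypothetical names, and the function returns the prefix letters together with the number of input characters consumed, mirroring the input and output counts in the text.

```python
def l_prefix(text, pos=0):
    """Return (prefix, input_chars_consumed) for an L prefix, else None."""
    t = text[pos:pos + 4].lower()
    if len(t) < 3 or t[0] != "l" or t[1] not in "eao":
        return None                       # search discontinued
    if t[2] in " -":
        return t[:2], 3                   # le/la/lo: 3 input, 2 output
    if t[2] == "s" and len(t) == 4 and t[3] in " -":
        return t[:3], 4                   # les/los/las: 4 input, 3 output
    return None                           # otherwise there is no prefix

def close_up(surname):
    """Lower-case a surname, closing up a leading L prefix if one is found."""
    hit = l_prefix(surname)
    if hit is None:
        return surname.lower()
    prefix, consumed = hit
    return prefix + surname[consumed:].lower()
```

Applied to the surnames above, `close_up("La Place")` gives `laplace` and `close_up("Las Cases")` gives `lascases`, while a name such as Lewis or Lester is left alone because no blank or hyphen follows the candidate prefix.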


Routines for other letters, similar in approach but varying in detail, produce
similar results:

Degli Antoni, Carlo                  degliantoni 6carlo 3
De La Roche, Mazo                    delaroche 6mazo 3
Fitz Gibbon, Constantine             fitzgibbon 6constantine 3
Van der Bijl, Hendrick Johannes      vanderbijl 6hendrick 7johannes 3

The search for prefixes and quasi-prefixes is not limited to the first surname.
It is and quite plainly should be extended to given names:

Bundy, McGeorge                      bundy 6macgeorge 3
Bundy, Mary Lee                      bundy 6mary 7lee 3

Whether it should be extended to later elements of compound surnames
is problematical. Bowing to the fact that filing is as much an art as a science
in practice, a compromise was reached: the prefix search was extended to
compounds, except when the prefix of the succeeding element begins with
D. The exception was made to accommodate the large number of Hispanic
names in this pattern, since it seemed clearly preferable to file all the
names beginning “Perez de” before any of those beginning “Perez del”:

Pérez, Joaquín                       perez 6joaquin 3
Pérez de Urbel, Justo                perez 7de 7urbel 6justo 3
Pérez del Castillo, José             perez 7del 7castillo 6jose 3
Pérez Galdós, Benito                 perez 7galdos 6benito 3

Perhaps skipping prefix treatment in subsequent elements should have
been the rule rather than the exception; but an exception would then
be required for Mc, “St.,” and perhaps others.

A list of the prefixes and quasi-prefixes sought for is given in Table 1.
In some cases the result is considered doubtful, and a special signal is
set. In such situations the program can then set another signal for the
macro and reprocess the name using alternate rules.

Table 1. List of Prefixes, Etc., Found by Special Search

Prefix   Notes        Prefix   Notes        Prefix   Notes

A        1, 4, 7      Den                   St.      4, 15
A        2, 4         Der      4, 11, 18    Ste.     16
Ab                    Des                   Te       4, 11
al       5            Di                    Ten      4
Al       3, 4, 6      Do                    Ter
An       4, 7         Dos      4, 11        The      1, 4, 8
Ap                    Du                    Van      1, 17
At                    el       5            Van      2, 4, 12, 17
Aus      17           El       3, 4, 6      Van’t    4, 9
Aus’m    4, 9         Fitz                  Vande
Bar      10           Im                    Vanden
Bat      10           In       17           Vander
Ben      10           La                    Ver
Da                    Las                   Von      17
Das      4, 12        Le                    Vonde
De       17           Les                   Vonden
Degli    1            Lo                    Vonder
Dei                   Los                   Z        4, 5
Del                   M’       4, 14        Zu       17
Della                 Mac                   Zum
Delle                 Mc       13           Zur
Dello                 O

 1. Only when followed by blank.
 2. Only when followed by hyphen.
 3. Only when upper case.
 4. “Doubt” signal is set.
 5. Bypassed, i.e. dropped out and disregarded.
 6. Bypassed if “alternate” signal is on.
 7. Bypassed unless “alternate” signal is on.
 8. Bypassed if first word.
 9. Aus’m and Van’t are closed up to “ausm” and “vant” by the general
    dropping of apostrophes but no attempt is made at further special
    processing since their rarity would not justify the necessary elaboration
    of the algorithm.
10. Not treated as prefix if special parameter is present.
11. Not treated as prefix if “alternate” signal is on.
12. Not treated as prefix unless “alternate” signal is on.
13. Expanded to “mac”.
14. Expanded to “mac” unless “alternate” signal is on.
15. Expanded to “saint”.
16. Expanded to “sainte”.
17. Another prefix may follow, as in De La.
18. Previous notes do not apply when preceded by Van or Von.

Diacritical marks on other than the first letter, or capitalization beyond
the normal, such as all caps., would prevent proper processing. Except as
indicated, lower case is included along with upper, and prefixes followed
by a hyphen are treated the same as those followed by a blank. The
MARC I corpus included several names with hyphenated prefixes, and
fortuitously a method was available with the 1401 for giving the hyphen
search almost a “free ride” along with that for the blank. Since the code
for hyphen was a single bit, the so-called B bit, and a blank was represented
by no bits, a “branch if bit equal” instruction specifying all the other bits,
A, 8, 4, 2, and 1, would branch if any character other than blank or hyphen
was present. Implementations for other machines may have to devote a
disproportionate number of instructions to the search for the rare
hyphenated prefixes, or else risk missing them.
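
The single-test trick can be mimicked with a bit mask. The sketch below is an analogy in Python, not 1401 code: the numeric weights given to the B, A, 8, 4, 2, and 1 bits are assumed for illustration, and only the two facts stated above (hyphen is the B bit alone, blank is no bits) are taken from the text.

```python
# Assumed numeric weights for the six data bits of a 1401 character.
B, A, EIGHT, FOUR, TWO, ONE = 0x20, 0x10, 0x08, 0x04, 0x02, 0x01

BLANK = 0          # a blank is represented by no bits
HYPHEN = B         # a hyphen is the B bit alone

# Mask naming every bit except B, as in the "branch if bit equal" test.
OTHER_BITS = A | EIGHT | FOUR | TWO | ONE

def is_blank_or_hyphen(code):
    """True when none of the A, 8, 4, 2, 1 bits is present in the character."""
    return (code & OTHER_BITS) == 0
```

One mask comparison therefore covers both separators; on a machine whose character codes lack this property, the hyphen needs its own comparison.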

No doubt some other prefixes could be added to the list. “Ua,” for
example, was considered but not included in the actual working macro 
after examination of a catalog of five million cards showed that only two 
beginning with these two letters were not for the prefix. The increase in 
processing time involved in adding another initial letter to the list of those 
looked for did not seem to be justified. 

In the program employing the macro for production of an index to 
names in the MARC Pilot Project data, whenever the “doubt” signal was 
set, the name was printed on an edit list for human inspection. The name 
was then reprocessed with the “alternate” signal set and if a different output 
form was developed, this form also was printed. If the person reviewing the 
list accepted the first form, no special action was necessary. If the second 
was preferred, a card with an identifying number and the code 2 was 
punched; if a hand-made form was needed, this form was entered on a 
card with the code 3. These cards and the original output tape were then 
used to produce an edited output tape, in which the alternate forms were 
dropped unless a card directed otherwise. A second printed listing, re- 
cording the action taken, was also produced. 
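
The edit run can be pictured with a small Python sketch. The record layout, the function name `apply_cards`, and the use of tuples are assumptions made for illustration; only the meaning of the card codes 2 and 3 comes from the description above. The sample keys are taken from the edit list examples in this paper.

```python
def apply_cards(entries, cards):
    """Produce the edited output from (id, first_form, alternate_form) entries.

    cards maps an identifying number to (code, hand_made_form):
      code "2" selects the alternate form, code "3" supplies a hand-made form.
    With no card, the first form stands and the alternate is dropped.
    """
    edited = []
    for ident, first, alternate in entries:
        code, hand_made = cards.get(ident, (None, None))
        if code == "2" and alternate is not None:
            edited.append((ident, alternate))
        elif code == "3":
            edited.append((ident, hand_made))
        else:
            edited.append((ident, first))
    return edited

entries = [
    (1, "huang 6yuean 7shan 3", "huang 6yuan 7shan 3"),
    (2, "mueller 6kurt 51903 3", "muller 6kurt 51903 3"),
    (3, "woolley 6al 7e 3", None),
]
cards = {1: ("2", None)}        # reviewer prefers the second form for entry 1
edited = apply_cards(entries, cards)
```

The design keeps the reviewer's work proportional to the number of exceptions: silence means acceptance of the first form.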

The doubtful cases identified by the algorithm are not limited to the
prefix problems described above. By far the commonest occasion for doubt
was the presence of “ä,” “ö,” or “ü.” Was it a Germanic umlaut, calling for
translation for filing purposes to “ae,” “oe,” or “ue,” or was it something
else? This is not the place to debate the practice, followed in most American
academic libraries, of filing umlauted letters as if spelled out with an “e.”
The major bibliographies covering the German book trade do so, but most
German dictionaries and encyclopedias do not; the example of other
reference works and indexes is mixed. Since the aim of the work described
here was to produce an index of names that could be used comfortably
by librarians used to the practice, a means of continuing it was sought.
However, it would be manifestly improper to insert an “e” if the mark
were a diaeresis rather than an umlaut; and, in the opinion of the writer,
almost equally improper for Hungarian, Finnish, and Turkish vowels. Even
those who do file such vowels in these languages as if they were Germanic
do not usually do so for Chinese. It should be noted here that not all
occurrences of special letters turn on the doubt signal. “Å” is routinely
translated to “aa” and Icelandic thorn to “th.”
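
A minimal Python sketch of this treatment follows. It assumes modern Unicode input rather than the diacritic codes of the MARC Pilot tapes, and the function name `fold` and the table names are hypothetical. Letters with other diacritics are passed through unchanged, which a fuller version would have to handle.

```python
import unicodedata

UMLAUT_EXPANDED = {"ä": "ae", "ö": "oe", "ü": "ue"}   # first (Germanic) form
UMLAUT_PLAIN = {"ä": "a", "ö": "o", "ü": "u"}         # alternate form
ROUTINE = {"å": "aa", "þ": "th"}                      # never doubtful

def fold(word, alternate=False):
    """Return (filing_form, doubt) for one word of a name."""
    out, doubt = [], False
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in UMLAUT_EXPANDED:
            doubt = True                              # goes on the edit list
            table = UMLAUT_PLAIN if alternate else UMLAUT_EXPANDED
            out.append(table[ch])
        elif ch in ROUTINE:
            out.append(ROUTINE[ch])
        else:
            out.append(ch)
    return "".join(out), doubt
```

For example, `fold("Müller")` yields `("mueller", True)` and, with the alternate signal on, `("muller", True)`; the doubt flag is what routes the name to the edit list for a human decision.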

Other occasions for signalling doubt include names with a suspiciously
high number of words before the first comma. This provision was introduced
in an attempt to catch some non-names in the original data which had been
wrongly coded, e.g. Women’s Association of the St. Louis Symphony. When
found, a card with the code D was punched for the edit run to delete these
entries entirely.

Statistics of processing for the entire corpus of MARC Pilot Project data
as cumulated and to some slight degree edited at the Harvard University
Library will be useful in seeing the edit list in proper perspective. The
entire file consisted of 47,884 records, 4,285 of which lacked names. The
remaining 43,599 records contained 55,288 names (or alleged names).
Of these 52,372 or 94.7% were judged to be purely routine. Special
processing of some sort not involving doubt (e.g., recognition of compound
names, expansion of “Mc” to “Mac,” closing up of apostrophe or
non-doubtful prefix) was performed on 2,283 names, or 4.1%. The total number
of doubtful names printed on the edit list was 631, or 1.1%. Somewhat more
than half of these (334) resulted in different forms on being reprocessed
with the alternate signal on. In 562 of the 631 doubtful cases, or 89% of
this group, the first or only form printed was accepted, so that no action
upon inspection was necessary. Only 69 names, or not quite one out of
800 of the whole number, required the punching of a card: 47 to indicate
choice of the second form, 14 supplying a hand-made form, and 8 calling
for deletion of non-names. Subsequent changes in the macro would have
reduced considerably the number of names requiring hand-made forms.
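
Most of the proportions quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only figures stated in the paragraph; the variable names are illustrative.

```python
records_total = 47_884
records_without_names = 4_285
names_total = 55_288
special_no_doubt = 2_283
doubtful = 631
changed_on_reprocessing = 334
first_form_accepted = 562
cards = {"second form": 47, "hand-made form": 14, "deletion": 8}

records_with_names = records_total - records_without_names
share_special = 100 * special_no_doubt / names_total
share_doubtful = 100 * doubtful / names_total
share_accepted = 100 * first_form_accepted / doubtful
cards_punched = sum(cards.values())
names_per_card = names_total / cards_punched

print(records_with_names)          # 43599
print(round(share_special, 1))     # 4.1
print(round(share_doubtful, 1))    # 1.1
print(round(share_accepted))       # 89
print(cards_punched)               # 69
print(round(names_per_card))       # 801
```

The 69 cards equal the 631 doubtful names less the 562 accepted as first printed, and one card per roughly 801 names agrees with “not quite one out of 800.”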

It will be instructive to examine some of the names from the edit list
to see what types of problems arise. The first selection of actual consecutive
names (from LC card number 66-15363 through 66-17297) is rather typical:

Barnard, Douglas St. Paul            barnard 6douglas 7saint 7paul 3
Ekelöf, Gunnar,=1907-                ekeloef 6gunnar 51907 3
                                 or: ekelof 6gunnar 51907 3
Woolley, Al E.                       woolley 6al 7e 3
Schönfeld, Walther H. P.,=1888-      schoenfeld 6walther 7h 7p 51888 3
                                 or: schonfeld 6walther 7h 7p 51888 3
Jänner, Michael                      jaenner 6michael 3
                                 or: janner 6michael 3
Müller, Alois,=1924-                 mueller 6alois 51924 3
                                 or: muller 6alois 51924 3
Huang, Yüan-shan                     huang 6yuean 7shan 3
                                 or: huang 6yuan 7shan 3
Müller, Kurt,=1903-                  mueller 6kurt 51903 3
                                 or: muller 6kurt 51903 3

Note the dominance of simple umlauts; also, as a curiosity, the fact that
all persons named “Al” appear on the list because of the possibility that 
it might be an unhyphenated Arabic prefix. Note also that Saint is treated 
as a separate word, not closed up as a prefix. “St.” was originally put on 
the doubtful list with the thought that it might stand for Sankt or Szent 
instead of Saint, although normal library practice would not use an abbre- 
viation in such cases. Its inclusion on the doubtful list was unexpectedly 
justified, however, by the occurrence of the name Erlich, Vera St. It seems 
likely that in this case “St.” may stand for a patronymic, perhaps Stojanova 
or Stefanova, and there may be other occasions on which St. rather than 
S. is used as an abbreviation for such a name as Stefan (cf. the French 
use of Ch. rather than simple C. as an abbreviation for Charles). 

The only action required for the names in the list above would be to 
punch a “2” card for the Chinese name Huang, Yuan-shan. Indeed, just 
as the umlaut is the largest category on the edit list, so the non-umlaut — 
a diacritic that looks like an umlaut but does not call for insertion of “e” — 
is the commonest occasion for punching an exception card. Occasionally a 
diaeresis is found: 

Lecomte du Noüy, Pierre              lecomte 7du 7nouey 6pierre 3
                                 or: lecomte 7du 7nouy 6pierre 3

More common are certain front vowels in Hungarian, Finnish, or Turkish,
or the vowel ü in Chinese as already encountered:

Földi, Mihály                        foeldi 6mihaly 3
                                 or: foldi 6mihaly 3
Tölgyessy, Juraj                     toelgyessy 6juraj 3
                                 or: tolgyessy 6juraj 3
Mettälä-Portin, Raija                mettaelae 7portin 6raija 3
                                 or: mettala 7portin 6raija 3
Närvänen, Sakari                     naervaenen 6sakari 3
                                 or: narvanen 6sakari 3
Inönü, E.                            inoenue 6e 3
                                 or: inonu 6e 3
Sümer, Mine                          suemer 6mine 3
                                 or: sumer 6mine 3
Yü, Ying-shih                        yue 6ying 7shih 3
                                 or: yu 6ying 7shih 3


Some libraries avoid the problem by treating all but the last of these as
if umlauted, but determination of the correct category can usually be made
at sight. Occasionally a name gives pause, for example these two which
both prove to be Swiss and presumably Germanic, although Chönz may be
Romansh:

Chönz, Selina                        choenz 6selina 3
                                 or: chonz 6selina 3
Rüede, Thomas                        rueede 6thomas 3
                                 or: ruede 6thomas 3

Somewhat more troublesome are names where some but not all elements
are Germanic:

Vogt, Ulya (Göknil)                  vogt 6ulya 7goeknil 3
                                 or: vogt 6ulya 7goknil 3
Ouchterlony, Örjan                   ouchterlony 6oerjan 3
                                 or: ouchterlony 6orjan 3
Iványi-Grünwald, Béla                ivanyi 7gruenwald 6bela 3
                                 or: ivanyi 7grunwald 6bela 3

Although Vogt is obviously Germanic, Ulya Göknil is equally obviously
not, and therefore the decision is that no umlaut is present. Örjan, on the
other hand, is a Scandinavian forename, to be treated as umlauted even
though coupled with a surname of Scottish Gaelic origin. Béla
Iványi-Grünwald is a more difficult case. Grünwald is of course Germanic in
origin, but can it be regarded as Magyarized? In English we might assume
that such a name is Anglicized when the bearer starts writing it Grunwald
or Gruenwald. However, the case is not so clear in Hungarian, since that
language also has the letter “ü.” Discussion of such a point may seem to
split hairs, but it does involve a significant difference between manual and
machine systems. In a manual system, the question of whether to file as
Ivanyi-Grunwald or as Ivanyi-Gruenwald would arise only in the
exceedingly unlikely event that another name which would file between the two
also occurred in the corpus. In a machine system, however, any difference,
even this late in a distinctive name, could result in the various works of
the author being misfiled among themselves, or a work about him filed
before one by him.

Use of different codes to represent the same graphic, umlaut on the
one hand or diaeresis or other non-umlaut on the other, would drastically
reduce both the number of doubtful names and the number of those for
which an exception procedure is required. The Harvard College Library
actually follows this practice. The Library of Congress experimented with
it, but found that catalogers were reluctant in some cases to make the
decision. Contemplation of the case of Béla Iványi-Grünwald gives the
author more sympathy with this reluctance than he originally felt.

In attempting to evaluate the method described above, one must acknowl- 
edge both strong points and limitations. On the one hand it is very gratifying 
to see AEsopus and [Aesopus] falling together despite differences in the 
capitalization of the “e” and the bracketing, and to find such sequences 
as the following, all without even being referred to the edit list under the 
rules then prevailing: 

Aziz, Khursheed Kamal                aziz 6khursheed 7kamal 3
Aziz Ahmad                           aziz 7ahmad 3
al-Azm, Sadik J.                     azm 6sadik 7j 3
Azrael, Jeremy R.                    azrael 6jeremy 7r 3
Ba Maw, U                            ba 7maw 6u 3
Baab, Clarence Theodore              baab 6clarence 7theodore 3

Delgado, David J.                    delgado 6david 7j 3
Del Grande, John Joseph              delgrande 6john 7joseph 3
Delhom, Louis A.                     delhom 6louis 7a 3
Delieb, Eric                         delieb 6eric 3
DeLise, Knoxie C.                    delise 6knoxie 7c 3
De Lisser, R. Lionel                 delisser 6r 7lionel 3
Dell, Ralph Bishop                   dell 6ralph 7bishop 3
Dellinger, Dave                      dellinger 6dave 3
Dell’Isola, Frank                    dellisola 6frank 3
Del Mar, Alexander                   delmar 6alexander 3
Delmar, Anton                        delmar 6anton 3
Delmar-Morgan, Edward Locker         delmar 7morgan 6edward 7locker 3



While it is certainly true that the system cannot survive without some 
provision for referring doubtful questions to a human editor, the number 
of these depends to a considerable extent on the filing and coding policies 
followed. Provided forename entries are coded as such, the system does 
a good job of identifying possible problems. (Presently, all multiple word 
forename entries are considered doubtful.) “Ua” has already been cited 
as an example of a prefix deliberately omitted, and there are others which 
could be added at any time it is thought worth while. A more troublesome 
situation, pointed out by Kelley Cartwright, is the possible occurrence of 
“Van” as a non-final element of an unhyphenated Vietnamese name. The 
only way this could be prevented from misfiling by merging it with the 
next element would be to throw all “Vans” including the numerous ones 
of Dutch origin into the doubtful category, expanding the edit list more 
than twenty percent. This did not seem advisable, particularly since normal 
usage is to hyphenate Vietnamese compound names. 


Up to this point the evaluation is quite favorable. The system can
correctly process a very large proportion of names, including some which
involve quite sophisticated points, without reference to a human editor,
and it can call virtually all the rest to the attention of an editor. However,
human review of problems means that there will be occasions when
borderline cases are decided in different ways. If a permanent machine file of
all established forms of names in the system is kept, both forms of each
doubtful name could be checked against it so that decisions already made
would not have to be repeated, thus saving the time of the editor as well
as the hazard of differing decisions. It would of course be very expensive
to keep such a file just for this purpose, but a file of this type would probably
form a part of a comprehensive mechanized bibliographic system anyway.

Another area in which a mixed report would have to be given to the
system is its extensibility to types of headings other than names. In work
conducted on the same principles with a few thousand early titles from
the MARC Pilot Project, there were only two conspicuous problems, one
of which may not in fact be a problem: the filing of numbers as such rather
than as if they were spelled out in the language of the title. True, the
routine in use did not provide for bringing numbers of differing length
into logical order (“50 great ghost stories” before “200 . . .”), but that
is a readily attainable refinement. The other problem is more refractory
and is exemplified by titles beginning with prefix names, for example
“De Gaulle,” “De Soto,” and “Van Gogh.” Names within titles could not
receive the usual name treatment because there was no way of identifying
them as such, and therefore the prefixes were filed as separate words.
Furthermore, while MARC Pilot Project authors were quite a cosmopolitan
lot, the titles were almost entirely in English. Therefore, removal of
initial articles was not much of a problem. There did not happen to be any
work beginning “A to Z of . . .” However, there was a book which, although
in English and so coded, had a title beginning with a Spanish article:
“La vida,” by the late Oscar Lewis. In working toward automatic removal
of initial articles from titles the usual coding of the language of the
work is available and will be checked first. This seems desirable both
because it is probably more efficient in machine time than to check every
title against a long list of possible articles in many languages, and
because words that are articles in one language are not necessarily so in
another. Most occurrences of initial “die” are probably German articles,
but some are other parts of speech in English, for example “Die Casting”
or “Die like a Dog.”
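
Checking the language code first might look like the following Python sketch. Everything here is an assumption made for illustration: the three-letter codes, the very partial article lists, and the function name `strip_initial_article` are not taken from the work described.

```python
# Hypothetical, deliberately incomplete article lists keyed by language code.
ARTICLES = {
    "eng": ("a", "an", "the"),
    "ger": ("der", "die", "das"),
    "spa": ("el", "la", "los", "las"),
}

def strip_initial_article(title, language):
    """Drop the first word if it is an article in the coded language."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES.get(language, ()):
        return rest
    return title
```

With these tables, “Die Casting” coded as English keeps its first word, while a German title beginning with “Die” loses it. The sketch also reproduces both difficulties mentioned above: “La vida” coded as English keeps its Spanish article, and “A to Z of” coded as English wrongly loses its “A.” Cases of this kind would still need a flag or a hand-made key.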

Just as the umlaut is the commonest problem in names, the initial indefinite
article which is the same as the numeral “one” in several languages may
well be the most frequent occasion for doubt in processing of titles. “Un”
will usually mean “A,” to be dropped; but will sometimes mean “One,” to
be kept. There are certainly other problems, in addition to the one with
prefix names already mentioned, including some that give trouble even in
manual filing: “Charles the First,” “Charles II,” “Charles V et son
temps.” It may be that at some point in the cataloging process a reviser
will have to be on the lookout for certain of these special situations and
add flags to indicate that a title includes a prefix name, or that it begins
with an article which would not be found by program, or that it does not
begin with an article although it appears to do so, or that for some other
reason it calls for a hand-made key.

The system described is not an absolute system, but absolute systems
have their own tyrannies. If, as the author believes, Cartwright and
Shoffner (2) are correct in thinking that a mixture of methods will be
required in actual book catalog projects, then a system along the lines of
the one described may well be a useful part of the mix.